Abstract

A robust and efficient anomaly detection technique is pro- 
posed, capable of dealing with crowded scenes where tra- 
ditional tracking based approaches tend to fail. Initial fore- 
ground segmentation of the input frames confines the analy- 
sis to foreground objects and effectively ignores irrelevant 
background dynamics. Input frames are split into non- 
overlapping cells, followed by extracting features based on 
motion, size and texture from each cell. Each feature type 
is independently analysed for the presence of an anomaly. 
Unlike most methods, a refined estimate of object motion 
is achieved by computing the optical flow of only the fore- 
ground pixels. The motion and size features are modelled by 
an approximated version of kernel density estimation, which 
is computationally efficient even for large training datasets. 
Texture features are modelled by an adaptively grown code- 
book, with the number of entries in the codebook selected 
in an online fashion. Experiments on the recently published 
UCSD Anomaly Detection dataset show that the proposed 
method obtains considerably better results than three recent 
approaches: MPPCA, social force, and mixture of dynamic
textures (MDT). The proposed method is also several or- 
ders of magnitude faster than MDT, the next best perform- 
ing method. 



1. Introduction 

Automated detection of anomalous events in video feeds 
has the potential to provide more vigilant surveillance, pos- 
sibly in lieu of, or as an assistance to, human operators 
who have limited attention spans when faced with tedious 
tasks [13]. Qualifying an event as anomalous is subjective 
and depends on the intended application as well as context. 
However, without being application or context specific, an 
anomalous event can be defined as any event that is different 
from what has been observed beforehand. 

Detection of anomalous events can be hence viewed as 
a binary classification problem, where there are training ex- 
amples only for one class, generally. Typical algorithms 
model the dynamics of 'normal' activity or 'expected' behaviour
and compare new observations with an existing
model. Any outliers are labelled as anomalous. An ideal 
system is expected not only to detect anomalous events ac- 
curately, but also to adapt itself to the changes witnessed in 
the environment over time. 

Several anomaly detection techniques have been pro- 
posed in various research fields. Chandola et al. [10] discuss 
them in detail in their survey, while Saligrama et al. [26] 
examine video based anomaly detection approaches in the
context of surveillance. Existing methods in the literature 
can be roughly placed into two categories: (i) analysis by 
tracking, where trajectories of individual objects are main- 
tained; (ii) analysis without tracking, where other features 
such as motion and texture are employed to model activity 
patterns of a given scene. 

In the first category, almost all approaches use tracking 
information directly to gather object speed and direction, 
and indirectly as an aid in determining features such as the 
size and aspect ratio of objects [4, 14, 21, 24, 31]. While 
trajectory based approaches are suitable for cases where the 
scene is comprised of only a few objects, in crowded en- 
vironments it is difficult to reliably maintain tracks due to 
occlusion and overlap of objects [16, 19]. 

In light of the above problems, in the second category 
the anomaly detection task is formulated while deliberately 
omitting the tracking of specific objects. Most approaches 
in this category largely rely on motion or motion-related 
features. For example, Mehran et al. [20] model crowd be- 
haviour using a "social force" model, where the interaction 
forces are computed using optical flow. Adam et al. [1] 
model optical flow at a set of fixed spatial locations using 
probabilistic histograms. Ermis et al. [11] propose using 
busy-idle rates of each pixel to detect abnormal behaviour. 

As the above techniques solely rely on motion informa- 
tion, anomalies occurring due to object size or appearance 
may not be detected. To address this limitation, Mahadevan 
et al. [19] recently proposed to jointly model the appear- 
ance and dynamics of crowded scenes, using mixtures of 
dynamic textures (MDT) [8]. The method explicitly inves- 
tigates both temporal and spatial anomalies. Though the re- 
ported comparative results show improvements over earlier 
techniques, the method's main drawback is heavy compu- 
tation. Evaluating a frame of size 240 x 160 takes about 25 
seconds (ie. 2.4 frames per minute). 



In this paper, we present a robust anomaly detection al- 
gorithm with relatively low complexity, targeted primarily 
for crowded scenes where traditional tracking based ap- 
proaches tend to fail. To suppress undesirable background 
dynamics, such as waving trees and illumination variations, 
we perform foreground segmentation and retain only fore- 
ground objects for further analysis. Each input frame is 
split into non-overlapping cells (small regions). Based on 
each frame's foreground mask, the relevant cells are anal- 
ysed for the presence of an anomaly. Unlike most methods, 
a refined estimate of motion is achieved by computing the 
optical flow only for the foreground pixels. In addition to 
motion, the proposed method analyses the size and texture 
of foreground objects at each cell location. 

Motion, size and texture are modelled separately. Inde- 
pendent analysis helps to keep computation efficient, and 
allows for inferring the nature of the anomaly (eg. speed 
violation, lack of motion, size too large, etc). Each cell is 
labelled as either normal or anomalous after combining the 
outputs of multiple classifiers (one for each feature type). 

We continue the paper as follows. In Section 2 the pro- 
posed algorithm is described in detail. Performance evalua- 
tion and comparison with three recent algorithms is given in 
Section 3. The main findings and possible future directions 
are presented in Section 4. 

2. Proposed Algorithm 

The proposed method has four main components: 

1. Feature extraction, where input images are split into
non-overlapping spatial regions, termed cells, and features
are extracted based on motion, size and texture of the 
foreground objects contained in the cells. 

2. Model estimation, which models the normal dynamics 
witnessed at each cell location. There are separate mod- 
els for each feature type. 

3. Classification of each cell as anomalous or normal, 
where each cell is sequentially checked for normality by 
up to two classifiers. If the first classifier deems the cell
anomalous, the second classifier is not consulted. In order of
processing, the two classifiers are:

(a) Speed check, where the likelihood of the magnitude of mo- 
tion of any foreground objects is evaluated. 

(b) Size and texture check, where first the likelihood of the 
size of a foreground object is evaluated. Cells with low 
likelihoods (suggesting an anomaly) are further analysed 
according to their texture, in order to validate the presence 
of the anomaly. 

4. Spatio-temporal post-processing, to minimise isolated 
random noise present in the generated anomaly masks. 

Each of the components is explained in detail in the follow- 
ing sections. 



2.1. Feature Extraction 

Let the resolution of the greyscale image sequence $X$ be
$W \times H$. Each image is split into non-overlapping cells
(regions) of size $N \times N$, with the cell located at $i$ and $j$
denoted by $C(i,j)$. The cell co-ordinates have the range
of $i = 1, 2, \ldots, (W/N)$ and $j = 1, 2, \ldots, (H/N)$. Let $I_t$ be
the frame at time instant $t$ and let its corresponding cells be
denoted by $C_t(i,j)$.

In order to restrict the analysis to regions of interest 
and to filter out distractions (eg. waving trees, illumina- 
tion changes, etc), we perform foreground segmentation on 
each incoming frame. We have used the method proposed 
in [23], due to its robustness, high-quality foreground masks 
and the ability to estimate the background even in the pres- 
ence of multiple moving foreground objects. Alternative 
techniques for estimating the background in crowded scenes 
include [3, 22]. 

For each cell, we extract features based on motion, size 
and texture. The foreground masks are referenced while 
computing the features. The details of the three features are 
given below. 
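
As an illustration of the cell decomposition, the following minimal sketch counts the foreground pixels falling into each non-overlapping cell of a binary foreground mask. The function name `cell_occupancy`, the row-major grid layout and the decision to ignore partial cells at the image border are assumptions made for this example, not details taken from the method description.

```python
def cell_occupancy(mask, n):
    """Count foreground pixels in each non-overlapping n x n cell.

    mask: list of H rows, each a list of W values (1 = foreground, 0 = background).
    Returns a grid with H // n rows and W // n columns. Partial cells at the
    right or bottom border are ignored (an assumption of this sketch).
    """
    rows, cols = len(mask) // n, len(mask[0]) // n
    occ = [[0] * cols for _ in range(rows)]
    for y in range(rows * n):
        for x in range(cols * n):
            if mask[y][x]:
                occ[y // n][x // n] += 1
    return occ


# Toy 4 x 4 mask split into 2 x 2 cells.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
occupancy = cell_occupancy(mask, 2)
```

Cells whose count is zero contain no foreground and can be skipped when features are computed.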

2.1.1 Motion 

To estimate the motion associated with cell $C_t(i,j)$, we
compute the optical flow of only the foreground pixels. The
iterative Lucas-Kanade algorithm [7, 18] is employed to
compute the displacement of pixels between two consecutive
frames, with a fixed search window around each pixel.
We first calculate the average motion associated with cell
$C_t(i,j)$ using:

where, for foreground pixel $n$, $v_x^{(n)}$ and $v_y^{(n)}$ are the optical
flows in the x and y directions, respectively, while $N_f$ is the
total number of foreground pixels within the cell.
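
A minimal sketch of the cell-level average motion is given below. It assumes that the average is the mean optical-flow magnitude over the cell's foreground pixels, which is one plausible reading of the definitions above; averaging the flow vectors before taking the magnitude would be another. The function name and the convention of returning zero for cells without foreground are hypothetical.

```python
import math


def average_cell_speed(flow_x, flow_y, mask):
    """Mean optical-flow magnitude over the foreground pixels of one cell.

    flow_x, flow_y, mask: equally sized 2D lists covering a single cell.
    Returns 0.0 when the cell has no foreground pixels (assumed convention).
    """
    total, n_f = 0.0, 0
    for row_x, row_y, row_m in zip(flow_x, flow_y, mask):
        for vx, vy, fg in zip(row_x, row_y, row_m):
            if fg:  # background pixels do not contribute to the estimate
                total += math.hypot(vx, vy)
                n_f += 1
    return total / n_f if n_f else 0.0


# Toy 2 x 2 cell: only the top row is foreground.
flow_x = [[3.0, 0.0], [0.0, 10.0]]
flow_y = [[4.0, 0.0], [0.0, 10.0]]
mask = [[1, 1], [0, 0]]
speed = average_cell_speed(flow_x, flow_y, mask)
```

The large flow at the background pixel in the bottom row is ignored, which illustrates why restricting the computation to foreground pixels can give a cleaner estimate of object motion.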

The motion feature for cell $C_t(i,j)$ is taken to be the
smoothed (noise-reduced) version of the cell's average motion,
calculated using straightforward temporal averaging:

2.1.2 Size



Relying on motion alone for anomaly detection might be 
insufficient, as certain anomalies may exhibit motion that is 
considered as normal (eg. a slow moving vehicle on a path 
designated for pedestrians). Furthermore, motion estima- 
tion techniques can suffer from the aperture problem [30]. 
To increase the sensitivity of anomaly detection, the size of 
foreground objects can be analysed. 

A common technique to measure object size is via connected
component analysis on the foreground masks. However,
in crowded environments it becomes ineffective due
to object overlap and occlusion. Instead, an approximate
size of an object contained within a cell can be obtained by 
considering its foreground occupation along with that of its 
neighbouring cells (as the object may occupy more than one 
cell). An example is shown in Fig. 1. 

Specifically, let us denote the foreground occupancy
(number of foreground pixels) for cell $C_t(i,j)$ by $o_t(i,j)$.
We define the size feature for cell $C_t(i,j)$ as a weighted
combination of the foreground occupancy values of the cell
and its immediate neighbours:

$$\mathrm{size}_t(i,j) = \sum_{a=i-1}^{i+1} \sum_{b=j-1}^{j+1} G(a-i+1,\, b-j+1)\, o_t(a,b) \tag{3}$$

where $G$ is a 3 x 3 Gaussian mask [12]. The mask is used for
placing prominence on the center cell and hence reducing 
the impact of neighbouring cells that, in crowded scenarios, 
may contain foreground pixels belonging to other objects 
(in addition to the object of interest). 
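
The weighted sum in Eqn. (3) can be sketched as follows. The specific mask weights (a normalised binomial approximation of a Gaussian) and the treatment of neighbours outside the grid as empty are assumptions of this example; the text only states that a 3 x 3 Gaussian mask is used.

```python
# Assumed 3 x 3 Gaussian-like mask (binomial weights, normalised to sum to 1).
G = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]


def size_feature(occ, i, j):
    """Weighted sum of occupancy over cell (i, j) and its eight neighbours.

    occ[a][b] holds the foreground pixel count of cell (a, b).
    Neighbours outside the grid contribute zero (assumption).
    """
    total = 0.0
    for a in range(i - 1, i + 2):
        for b in range(j - 1, j + 2):
            if 0 <= a < len(occ) and 0 <= b < len(occ[0]):
                total += G[a - i + 1][b - j + 1] * occ[a][b]
    return total


# A tall object spanning the middle column of a 3 x 3 grid of cells.
occ = [
    [0, 16, 0],
    [0, 64, 0],
    [0, 32, 0],
]
centre = size_feature(occ, 1, 1)
corner = size_feature(occ, 0, 0)
```

The centre cell dominates its own feature value, while the corner cell only receives a small contribution from the object in the neighbouring column.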

2.1.3 Texture 

While the size feature can be useful for increasing the sen- 
sitivity of anomaly detection, using it without qualifica- 
tion may also increase the false alarm rate. For example, 
in crowded environments the foreground masks of people 
walking close to each other could resemble a large fore- 
ground object. 

To address this problem, the texture present within the 
cell can be used for increasing selectivity. To this end, we 
filter a given image using 2D Gabor wavelets [17] at four 
orientations: 0, 45, 90 and 135 degrees. The texture descriptor
for cell $C_t(i,j)$ is hence a 4D vector:

$$\mathrm{txt}_t(i,j) = [\, m_0 \;\; m_{45} \;\; m_{90} \;\; m_{135} \,]^{\top} \tag{4}$$

where $m_\theta$ is the sum of the response magnitudes of the
wavelet oriented at $\theta$ degrees, over the pixels contained
within the cell. The texture vectors are only collected for 
cells that have at least one foreground pixel, in order to min- 
imise modelling of the background. 

Figure 1. (a) the approximate size of the foreground object contained
in the center cell $C(i,j)$ is computed by considering foreground
pixels in the cell as well as its neighbouring cells; (b) an
example of a foreground object appearing in the center cell and its
neighbours; (c) the corresponding foreground mask, found via an
automatic foreground segmentation algorithm.



2.2. Scalable Semi-Parametric Model Estimation 

Surveillance scenarios include platforms at train/bus 
stations, buildings (both indoors/outdoors) as well as 
road/walkway traffic. In all these scenarios, even normal 
day-to-day events have inherent variations that are random 
in nature. For example, the speed of vehicles on a road can 
vary arbitrarily due to traffic light signals and congestion. 
Furthermore, the dynamics of the scene can keep changing 
over time. As such, parametric approaches are unlikely to 
be effective for modelling distributions of features in these 
scenarios, as the number of modes is unlikely to be reli- 
ably known beforehand. With this in mind, we employed 
a semi-parametric modelling approach, which can be used 
for modelling arbitrary distributions without using any as- 
sumptions on the forms of the underlying densities. 

We model each cell by considering only the features ex- 
tracted from that particular cell location, along the tempo- 
ral axis. As described in Section 2.1, there are three fea- 
ture types: motion (a scalar), size (also a scalar) and a tex- 
ture descriptor (a 4D vector). In tasks such as object de- 
tection/recognition, it is often argued that joint modelling 
of features yields better results than modelling them inde- 
pendently [2, 28]. However, in our context such an ap- 
proach may fail to detect outliers (ie. the anomalies) ac- 
curately, due to the implicit mutual influence exhibited by 
the features in the decision process. Furthermore, the dy- 
namics of a crowded scene are inherently arbitrary in na- 
ture, which may render joint modelling ineffective. We will 
hence model these features separately. Independent analysis
keeps computation efficient and examines each feature precisely
for anomalies, which in turn allows the nature of the anomaly
to be inferred (eg. speed violation, lack of motion,
size too large, etc).

For modelling the motion and size features, a straightfor- 
ward and efficient semi-parametric approach involves con- 
structing normalised histograms. The training data can be 
discarded once the histograms are built. However, a ma- 
jor problem with this approach is the presence of sharp dis- 
continuities in the estimated densities due to binning, rather 
than that of the underlying distribution that generated the 
data [6]. To overcome the above limitation, it is possi- 
ble to use kernel density estimation techniques that result 
in smoother probability density functions [6, 25]. Their 
training phase only involves storing of all the data samples. 
However, their drawback is increased computational cost 
and memory requirements as the dataset becomes larger [6]. 
As such, these techniques can suffer from scalability issues. 

To achieve a better trade-off between accuracy and com- 
putational requirements, we create a smoothed histogram by 
temporarily storing all the training samples and performing 
Gaussian kernel based density estimation to compute the 
probability of the continuous variable (ie. motion or size) 
only at discrete points over its entire permissible range. 



In effect, we assume the random variable to be discrete and 
compute its probability at fixed points using: 

where $\Delta x$ is the resolution of the step size (eg. 0.25),
$s \in \{0, 1, 2, 3, \ldots, S\}$, with $S \Delta x$ being the valid upper limit
of the variable in consideration. $N$ is the number of samples
in the training dataset and $h$ is the bandwidth of the Gaussian
kernel. The probability values are normalised to obtain
a probability mass function (pmf). As in the histogram approach,
the training data is discarded once the pmf is computed.
The resultant pmf is denoted by $p(\cdot)$.
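
The smoothed histogram can be sketched as below, assuming the standard Gaussian kernel density estimator evaluated at the points $0, \Delta x, 2\Delta x, \ldots, S\Delta x$ and then normalised. The function names, the bandwidth value and the nearest-point lookup are illustrative assumptions rather than details given in the text.

```python
import math


def smoothed_pmf(samples, step, upper, h):
    """Gaussian kernel density estimate evaluated at 0, step, ..., S*step,
    normalised so that the returned values sum to one."""
    s_max = int(round(upper / step))
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    density = []
    for s in range(s_max + 1):
        x = s * step
        kernel_sum = sum(math.exp(-0.5 * ((x - xn) / h) ** 2) for xn in samples)
        density.append(norm * kernel_sum)
    total = sum(density)
    return [d / total for d in density]


def lookup(pmf, value, step):
    """Probability of the discrete point nearest to value (clamped to the range)."""
    idx = min(max(int(round(value / step)), 0), len(pmf) - 1)
    return pmf[idx]


# Toy training speeds clustered around 1.0; step, range and bandwidth are assumed.
training = [1.0, 1.25, 1.0, 0.75, 1.0]
pmf = smoothed_pmf(training, step=0.25, upper=4.0, h=0.25)
p_typical = lookup(pmf, 1.0, 0.25)
p_unusual = lookup(pmf, 3.0, 0.25)
```

Once `pmf` has been computed, the list `training` is no longer needed, and evaluating a new observation costs a single table lookup regardless of the size of the training set.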

2.2.1 Adaptive Modelling of Texture Descriptors

While the above approach is effective for modelling the dis- 
tribution of scalars (motion and size in our case), using it 
to model the distribution of the 4D texture descriptors is 
impractical, as the number of resulting discrete samples re- 
quired to cover the entire feature space (for just one cell 
location) would be quite large. For example, having only 20
equally spaced points in each dimension would generate $20^4$
(ie. 160,000) points in 4D space. As there is a non-trivial number
of cell locations, the total storage costs would hence be prohibitive.

Furthermore, the above density estimation approach implicitly
relies on a Euclidean distance, which will
be affected by variations in texture contrasts, rather than
purely measuring the differences between texture patterns.
For example, the texture descriptor will exhibit low magni- 
tude responses when the intensity of a pedestrian's clothing 
is similar to that of the background, and high magnitude re- 
sponses when the intensity is contrasting to the background. 

To address the storage problem, for each cell location 
we model the distribution of the texture descriptors using 
a codebook that is trained in an online fashion (adaptively 
grown), inspired by [16]. To address the distance measure 
problem, we employ Pearson's correlation coefficient [29] 
for measuring the similarity of two descriptors: 

where $\mu_x$ is the mean of the elements of vector $x$, and
$\rho(a,b) \in [-1,+1]$.
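
To illustrate why a correlation-based similarity suits texture descriptors, the sketch below implements the textbook form of Pearson's correlation coefficient and compares two descriptors with the same pattern but different contrast. The function name, the toy descriptor values and the handling of constant vectors are assumptions of this example.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length vectors."""
    mu_a = sum(a) / len(a)
    mu_b = sum(b) / len(b)
    da = [v - mu_a for v in a]
    db = [v - mu_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # assumption: constant vectors are treated as uncorrelated
    return sum(x * y for x, y in zip(da, db)) / denom


low_contrast = [1.0, 2.0, 4.0, 2.0]       # weak responses, peak at 90 degrees
high_contrast = [10.0, 20.0, 40.0, 20.0]  # same pattern, ten times stronger
different = [4.0, 2.0, 1.0, 2.0]          # peak at 0 degrees instead

same_pattern = pearson(low_contrast, high_contrast)
other_pattern = pearson(low_contrast, different)
euclidean = math.dist(low_contrast, high_contrast)
```

The two descriptors with the same pattern are far apart in Euclidean terms yet almost perfectly correlated, whereas the descriptor with a different dominant orientation is negatively correlated.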

The codebook is built as follows. Initially, the first training
vector is taken to be the first entry in the codebook.
Each of the remaining vectors is sequentially treated as a
new observation, $x$, which is compared to each entry in the
codebook, $c_k$, using Eqn. (6). If, for the best matching $c_k$,
$\rho(x, c_k) > 0.9$, the $k$-th entry is updated using [9]:

$$c_k^{\mathrm{new}} = c_k^{\mathrm{old}} + \frac{1}{W_k + 1} \left( x - c_k^{\mathrm{old}} \right) \tag{7}$$

where $W_k$ is the number of texture vectors associated so far
with entry $k$. If $\forall k: \rho(x, c_k) < 0.9$, vector $x$ is appended to
the codebook, thereby expanding it.
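
The codebook construction can be sketched as follows. The sketch assumes that the update is a running mean with step $1/(W_k + 1)$ and that a correlation of exactly 0.9 leads to a new entry, since the text leaves that boundary case open. All function and variable names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length vectors."""
    mu_a, mu_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mu_a for v in a]
    db = [v - mu_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def train_codebook(vectors, threshold=0.9):
    """Grow a codebook online: update the best match or append a new entry."""
    entries, counts = [], []
    for x in vectors:
        best_k, best_rho = -1, -2.0
        for k, c in enumerate(entries):
            rho = pearson(x, c)
            if rho > best_rho:
                best_k, best_rho = k, rho
        if best_k >= 0 and best_rho > threshold:
            w = counts[best_k]  # vectors associated with this entry so far
            entries[best_k] = [c + (v - c) / (w + 1) for c, v in zip(entries[best_k], x)]
            counts[best_k] = w + 1
        else:
            entries.append(list(x))  # no sufficiently similar entry: expand
            counts.append(1)
    return entries, counts


vectors = [
    [1.0, 2.0, 4.0, 2.0],
    [2.0, 4.0, 8.0, 4.0],  # same pattern as the first vector
    [4.0, 2.0, 1.0, 2.0],  # different pattern
]
entries, counts = train_codebook(vectors)
```

The number of entries is not fixed in advance; it is determined by how many distinct texture patterns are seen at the cell location during training.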



2.3. Cell Classification 

Each cell is sequentially checked for anomalies
by up to two classifiers. If the first classifier
deems the cell anomalous, the second classifier is not
consulted. Specifically, given a decision threshold $T$, cell
$C_t(i,j)$ is classified as anomalous if either of the following
two conditions is satisfied:

(a) $p_{\mathrm{mot}}(\mathrm{mot}_t(i,j)) < T$

(b) $p_{\mathrm{size}}(\mathrm{size}_t(i,j)) < T$ and $\rho_{\max}(\mathrm{txt}_t(i,j)) < 0.9$

where $p_{\mathrm{mot}}(\cdot)$ and $p_{\mathrm{size}}(\cdot)$ are the pmfs calculated in Section 2.2,
while $\rho_{\max}(\mathrm{txt}_t(i,j)) = \max_k \rho(\mathrm{txt}_t(i,j), c_k)$,
ie. the correlation coefficient of the closest matching codebook
entry.

Condition (a) is effectively a speed check, where speeds
that are either slower or faster than 'normal' are detected
(note that $p_{\mathrm{mot}}(\cdot)$ can define several 'normal' speeds). In
condition (b) both the size and texture are checked. As the
size feature alone is not able to distinguish between a
large object and a collection of small objects (eg. a crowd
of people), the texture feature is in effect used to verify the
presence of an anomaly indicated by the size feature.

The texture feature is only calculated for cells that con- 
tain foreground pixels, in order to avoid modelling the back- 
ground (which might be dynamic). As such, the texture 
feature is suitable for distinguishing among foreground ob- 
jects. However, as the feature may end up capturing irrele- 
vant textures when the cell contains only thin edges of the 
foreground, it is used in combination with the size feature 
rather than being used alone. 
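
The two-stage decision can be sketched as a short cascade. The function name, the returned reason strings and the example threshold are hypothetical; the inputs are assumed to be the pmf values and the best codebook correlation already computed for the cell.

```python
def classify_cell(p_mot, p_size, rho_max, threshold):
    """Return (is_anomalous, reason) for one cell.

    p_mot, p_size: pmf values of the cell's motion and size features.
    rho_max: correlation with the closest matching codebook entry.
    """
    # Condition (a): speed check. If it fires, the second check is skipped.
    if p_mot < threshold:
        return True, "speed"
    # Condition (b): unusual size, confirmed by unfamiliar texture.
    if p_size < threshold and rho_max < 0.9:
        return True, "size and texture"
    return False, "normal"


T = 0.01  # assumed decision threshold for this example
fast_object = classify_cell(0.001, 0.5, 0.95, T)
dense_crowd = classify_cell(0.2, 0.001, 0.95, T)
large_vehicle = classify_cell(0.2, 0.001, 0.5, T)
```

In the `dense_crowd` case the size is unlikely but the texture matches a known codebook entry, so the cell is kept as normal; this is the false alarm that the texture check is intended to suppress.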

2.4. Spatio-Temporal Post-Processing 

To minimise spurious and intermittent false alarms, 
spatio-temporal post-processing is performed on the 
anomaly masks generated by the classification procedure 
in Section 2.3. If a cell at time t was initially classified 
as anomalous, we consider its immediate neighbours along 
both the spatial and temporal axes (see Fig. 2). If at least 
two cells in each plane (ie. $t-1$, $t$, $t+1$) were classified as
anomalous, we assume that the cell in question was cor- 
rectly classified as anomalous. Otherwise, it is re-classified 
as being normal (ie. non-anomalous). 
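
One possible reading of this verification step is sketched below. It assumes that each plane is the 3 x 3 spatial window centred on the cell (including the centre position), and that frames without both temporal neighbours cannot confirm a detection. These choices, along with all names, are assumptions of the sketch.

```python
def count_anomalous(mask, i, j):
    """Number of flagged cells in the 3 x 3 window centred on (i, j)."""
    rows, cols = len(mask), len(mask[0])
    return sum(
        mask[a][b]
        for a in range(max(i - 1, 0), min(i + 2, rows))
        for b in range(max(j - 1, 0), min(j + 2, cols))
    )


def post_process(masks, min_count=2):
    """masks[t][i][j] is 1 if the cell was initially flagged as anomalous.

    A flagged cell is kept only if the planes t-1, t and t+1 each contain
    at least min_count flagged cells in its neighbourhood.
    """
    out = [[[0] * len(row) for row in frame] for frame in masks]
    for t in range(1, len(masks) - 1):
        for i, row in enumerate(masks[t]):
            for j, flagged in enumerate(row):
                if flagged and all(
                    count_anomalous(masks[t + dt], i, j) >= min_count
                    for dt in (-1, 0, 1)
                ):
                    out[t][i][j] = 1
    return out


# Three frames of a 3 x 5 cell grid: a persistent two-cell detection
# plus one isolated detection in the top-right corner of the middle frame.
f0 = [[0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
f1 = [[0, 0, 0, 0, 1], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
f2 = [[0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
cleaned = post_process([f0, f1, f2])
```

The persistent detection survives in the middle frame, while the isolated corner detection is re-classified as normal.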




Figure 2. For a cell initially classified as anomalous (marked in 
red), its immediate neighbours along both the spatial and tempo- 
ral axes are consulted to verify whether the cell was classified as 
anomalous due to noise. 

Figure 3. Examples of anomaly detection and localisation via the
proposed method (highlighted in red). Results are shown on the
(a) Ped1 and (b) Ped2 subsets of the UCSD Anomaly dataset.

3. Experiments 

To appraise the performance of the proposed approach, 
we performed experiments on the recently released UCSD 
Anomaly Detection dataset [19]. The dataset contains multiple
surveillance videos of two scenes (Ped1 and Ped2),
both with considerable crowds. Anomalies present in the
dataset include: skateboarders, bikers, motor vehicles, people
pushing carts as well as walking on the lawn. The image
size in Ped1 is 238 x 158 pixels, while in Ped2 it is 360 x 240.
Ped1 has 34 training and 36 test image sequences, while
Ped2 has 16 training and 12 test image sequences. Examples
are shown in Fig. 3.

The UCSD dataset has a prescribed evaluation proto- 
col [19], involving two types of evaluations: (i) frame-level 
anomaly detection, and (ii) within-frame anomaly localisa- 
tion. For frame-level anomaly detection, all test sequences 
have annotated groundtruth at frame-level in the form of a 
binary flag indicating the presence or absence of anomaly in 
each frame. For within-frame anomaly localisation, a subset
of test sequences (10 in Ped1 and 9 in Ped2) has the anomalous
regions within each frame marked. If at least 40% of
detected pixels (belonging to a detected anomaly) match the
ground-truth pixels, it is presumed the anomaly has been
localised correctly; otherwise it is treated as a 'miss'.

Table 1. Equal error rates (EERs) for frame-level anomaly detection,
obtained on the Ped1 and Ped2 subsets of the UCSD dataset.

Table 2. EERs for pixel-level anomaly localisation.

The proposed algorithm was compared with methods 
based on social force [20], MPPCA [15], and mixture of dy- 
namic textures (MDT) [19]. The first two methods rely on 
features obtained from optical flow while the last approach 
employs features based on appearance and scene dynamics. 
The quantitative results of the above three algorithms were 
adapted from [19]. To aid the interpretation of the results, 
we have reported the false negative rate [5] instead of the 
true positive rate used in [19]. 

Based on preliminary experiments, the cell size was set 
to 16 x 16, while the search window size in the optical 
flow computation (Sec. 2.1.1) was set to 15 x 15 (odd- 
sized to ensure a symmetrical search area around a given 
pixel). The experiments were implemented with the aid of 
the Armadillo C++ library [27]. 

The quantitative results for the frame-level evaluation 
are shown in Table 1 and in Fig. 4(a)-(b). The results for 
the within-frame evaluation are shown in Table 2 and in 
Fig. 4(c). Some of the qualitative results obtained by the 
proposed method are shown in Fig. 3. In Tables 1 and 2, 
the equal error rate (EER) is the point where the false neg- 
ative rate is equal to the false positive rate. At the EER, the 
proposed method outperforms the other methods at both the 
frame-level and within-frame evaluations, most notably on 
the anomaly localisation task. 
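
For reference, the EER can be estimated from sampled ROC points by locating where the false negative rate crosses the false positive rate. The sketch below uses linear interpolation between neighbouring operating points; the function name and the toy rates are made up for illustration and are not results from the experiments.

```python
def equal_error_rate(fpr, fnr):
    """Estimate the EER from operating points ordered so that the false
    positive rate rises while the false negative rate falls."""
    prev = None
    for p, n in zip(fpr, fnr):
        diff = n - p
        if diff == 0:
            return p  # an operating point lies exactly on the EER
        if diff < 0:
            if prev is None:
                raise ValueError("FNR is already below FPR at the first point")
            p0, d0 = prev
            w = d0 / (d0 - diff)  # fraction of the segment up to the crossing
            return p0 + w * (p - p0)
        prev = (p, diff)
    raise ValueError("FNR never drops to FPR on the sampled curve")


# Toy operating points (not taken from the paper).
fpr = [0.0, 0.1, 0.3, 1.0]
fnr = [1.0, 0.5, 0.1, 0.0]
eer = equal_error_rate(fpr, fnr)
```

A lower EER indicates that both error types can be kept small at the same operating point.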

An experimental implementation of the proposed algo- 
rithm in C++ yielded 12 fps (720 frames per minute) on a 
standard 3 GHz PC, for sequences of images with a size of 
240 x 160 (ie. processing the Ped1 subset). We note that this
is several orders of magnitude faster than the MDT method, 
which takes 25 seconds to process each frame (ie. 2.4 frames 
per minute) [19]. 

The proposed method has the ability to pick up anoma- 
lies (eg. skateboarder, bike) present even at the far end of 
the scene (eg. 2nd and 3rd images in column (a)). However, 
the last image in column (a) contains a 'miss': the biker 
was not detected. The cyclist was riding slowly and matching
the pace of the neighbouring pedestrian (bottom-left
corner). The texture in this context has strong vertical gradients,
making the biker appear as a pedestrian. Using a more
detailed texture descriptor may help in such cases.

We also note a false positive (a pedestrian being detected 
as an anomaly) in the last image of column (b). Upon further
investigation, this false positive was due to the fact that the 
cells in the region of the pedestrian had minimal or no activ- 
ity during the training phase. Consequently any foreground 
object entering this zone during testing was considered as 
anomalous, irrespective of the observed features. 

4. Main Findings and Future Directions 

In this paper we have proposed an anomaly detection al- 
gorithm targeted towards crowded scenes. In addition to de- 
tecting anomalies based on motion, it inspects for anomalies 
occurring due to size and texture. Video images are initially 
segmented into foreground regions to confine the analysis to 
regions of interest, ignoring the background (which might 
be dynamic). Unlike most methods which compute the op- 
tical flow for all pixels or a fixed set of pixels, the flow is 
computed only for the foreground pixels, thereby achieving 
a more precise estimate of object motion. 

Features based on motion, size and texture are extracted 
at cell level (small fixed-size regions) and are modelled in- 
dependently for precise anomaly detection. Motion and size 
features are modelled by an approximated kernel density es- 
timation technique, which is computationally efficient even 
on large training datasets. The texture features are repre- 
sented by an adaptively grown codebook, which is gener- 
ated in an online fashion. 

Experiments on the recently published UCSD Anomaly 
dataset (containing annotated surveillance videos) show that 
the proposed method obtains better results than several recent
methods: MPPCA, social force, and mixture of dynamic
textures (MDT). The proposed method attained considerably
more accurate anomaly localisation than the next best
performing method, MDT, while at the same time being 
several orders of magnitude faster than MDT. 

As part of future work, we aim to investigate the use 
of more descriptive features such as the orientation of mo- 
tion [15], which would allow the detection of events such 
as wrong-way traffic. It would also be useful to adaptively
update the models over long periods of time, allowing for 
context changes (eg. dense traffic might be usual during the 
day, but it can be unusual at night). 

The effect of the cell size should be analysed in the pres- 
ence of object variations due to factors such as image res- 
olution, perspective changes, as well as view angle. The 
optimal cell size might be scene dependent and vary across
the scene. For example, in Fig. 3(a), a larger size might be
more appropriate for the bottom-left corner (where objects 
appear relatively large), while a smaller size might be more 
effective in the top-right corner (where objects appear rela- 
tively small). 

Figure 4. ROC curves obtained on the UCSD Anomaly Detection
dataset, with the bottom-left corner representing ideal
performance: (a) frame-level anomaly detection on the Ped1 subset;
(b) frame-level anomaly detection on Ped2; (c) within-frame
anomaly localisation on Ped1. In all cases, the proposed method
outperforms the other approaches at the equal error rate (EER)
level, most notably on the anomaly localisation experiment.